
AUXILIARY INFORMATION GENERATION METHOD, 
AUXILIARY INFORMATION GENERATION APPARATUS, 
VIDEO DATA GENERATION METHOD, VIDEO DATA PLAYBACK METHOD, 
VIDEO DATA PLAYBACK APPARATUS, AND DATA STORAGE MEDIUM 
BACKGROUND OF THE INVENTION

1. Field of the Invention 

The present invention relates to an auxiliary information 
generation method, an auxiliary information generation apparatus, 
an image data generation method, and a data storage medium. More 
particularly, the invention relates to a method and an apparatus 
for generating auxiliary information which is used as index 
information when extracting or selecting a part of digital data 
such as digital video and audio data, and a method for generating 
partial video data by extracting a desired portion of image data 
by utilizing the auxiliary information, as well as a method and 
an apparatus for playing the partial video. Further, the 
invention relates to a data storage medium which stores a program 
for making a computer execute the auxiliary information 
generation method and the video data generation method, and data 
generated as the result of executing these methods. 

2. Description of the Related Art 

In recent years, with the progress in digitization of video 
and audio, standardization of video and audio data compression
methods such as MPEG-2 and MPEG-4 has been achieved for the
purpose of improving efficiency in recording or transmission and, 
furthermore, standardization relating to description of auxiliary 
information, which is used when selecting desired data from a 
database holding these video and audio data or extracting a 
portion of the video data, has been promoted as MPEG-7. 

Hereinafter, an example of description of auxiliary 
information relating to digital data based on MPEG-7 will be 
described with reference to drawings (ISO/IEC JTC 1/SC 29/WG
11/N3411, "MPEG-7 Multimedia Description Schemes WD (Version
3.0)", 2000.5).

As for viewing of video and audio data, it has been common 
practice that contents of video and audio data produced by a 
producer are broadcast and viewed by many people. However, as 
mobile devices such as personal computers and handy phones have 
become widespread, these mobile devices have permitted the users 
to interactively operate video and audio data through the 
Internet or the like, as well as to view these data one-sidedly. 
Accordingly, it is expected that service patterns, which enable 
the users to retrieve only a desired portion of contents from 
databases scattered on the Internet without viewing all
of the contents, or enable the providers to select contents 
according to preferences of the users and distribute the contents 
to the users, will become widespread in the future. MPEG-7 is an 
international standard, which is now under standardization, for
describing temporal information, contents, preferences of users,
and the like with respect to multimedia data such as video and
audio.

Fig. 50 shows an example of description based on MPEG-7
(excerpt from ISO/IEC JTC 1/SC 29/WG 11/N3410, "MPEG-7
Multimedia Description Schemes XM (Version 3.0)", 2000.5). In
this example, a portion of video data is described by a
description of VideoSegment, and the VideoSegment corresponds to
one scene. Auxiliary information is described by using 
MediaTimePoint indicating the start time of this segment, 
MediaDuration indicating the duration of this segment, and 
SegmentDecomposition indicating the presence or absence of a gap 
between segments, and the like. As shown in fig. 51, this
auxiliary information 3003 is added to a header 3002 or the like
of video data 3000, whereby the user can easily search for or
extract desired data such as video data. In fig. 51, 3001
denotes a data body corresponding to the video data 3000
excluding the header 3002.
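
As an illustration of the kind of description discussed above, the following is a minimal sketch that builds a segment description from a start time, a duration, and a gap flag. Only the names VideoSegment, MediaTimePoint, MediaDuration, and SegmentDecomposition come from the text; the element nesting, the `gap` attribute, the time string formats, and the helper names `SegmentDescription` and `to_xml` are assumptions made for illustration and do not reproduce the normative MPEG-7 schema.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass
class SegmentDescription:
    """Hypothetical container for the auxiliary information of one scene."""
    start: str     # start time of the segment (MediaTimePoint)
    duration: str  # duration of the segment (MediaDuration)
    gap: bool      # presence or absence of a gap between segments


def to_xml(seg: SegmentDescription) -> str:
    """Serialize the description; the layout is illustrative, not normative."""
    root = ET.Element("VideoSegment")
    ET.SubElement(root, "MediaTimePoint").text = seg.start
    ET.SubElement(root, "MediaDuration").text = seg.duration
    # Assumed attribute name for the gap indication.
    ET.SubElement(root, "SegmentDecomposition",
                  {"gap": "true" if seg.gap else "false"})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_xml(SegmentDescription("T00:00:10", "PT30S", gap=False)))
```

A string produced in this way could be placed in a header such as the header 3002, so that a search tool reads the description without decoding the data body.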

In MPEG-7, however, only the description itself of the 
auxiliary information relating to contents information is 
standardized, and a method for generating the auxiliary 
information is not defined. Further, there is no definition 
about what kind of information is to be provided using MPEG-7. 

Meanwhile, with respect to mobile devices such as handy
phones which have rapidly become widespread or progressed in
functions, it is expected that those provided with
capturing functions such as cameras or movies will become
available inexpensively in the future. In this case,
a captured (moving) picture can be transmitted to a
destination through a mobile communication network. On the other
hand, the telephone charge on such video communication is not
necessarily a fixed charge such as a month-by-month basis. In
the case of mobile terminals, a pay-per-view basis according to
the amount of transmitted/received data is usually employed.
Therefore, if the whole of a captured (moving) picture is
transmitted as it is, the communication cost will become high.

SUMMARY OF THE INVENTION 

The present invention is made to solve the above-described 
problems and has for its object to provide a method and an 
apparatus for generating auxiliary information relating to 
digital data, a method for generating video data, which can 
reduce the cost of transmitting a picture from a mobile terminal, 
a method and an apparatus for playing a part of video data having 
a high degree of importance, and a data storage medium which 
holds a program for executing these methods as well as data 
obtained as the result of executing these methods. 

Other objects and advantages of the invention will become 
apparent from the detailed description that follows. The 
detailed description and specific embodiments described are 
provided only for illustration since various additions and
modifications within the scope of the invention will be apparent
to those of skill in the art from the detailed description. 

According to a first aspect of the present invention, there 
is provided an auxiliary information generation method
which includes generating auxiliary information
relating to digital data when the digital data is generated, and
recording or transmitting the generated auxiliary information 
together with the digital data. Therefore, auxiliary information 
can be easily generated at a desired generation timing which is 
not defined in the standard. 

According to a second aspect of the present invention, in the 
auxiliary information generation method of the first aspect, a 
candidate of auxiliary information to be generated is selected,
and auxiliary information corresponding to the selected candidate
is generated. Therefore, auxiliary information can be easily
generated at a desired generation timing which is not defined in
the standard. 

According to a third aspect of the present invention, in the 
auxiliary information generation method of the first aspect, 
generation of the auxiliary information is carried out in 
synchronization with the start or completion of any one of 
inputting, recording, or transmission of the digital data.
Therefore, auxiliary information can be easily generated at a 
desired generation timing which is not defined in the standard. 

According to a fourth aspect of the present invention, in the
auxiliary information generation method of the first aspect,
generation of the auxiliary information is carried out with, as a 
trigger, user operation performed on a data generation apparatus 
which generates the digital data. Therefore, auxiliary 
information can be generated at a desired generation timing which 
is not defined in the standard. 

According to a fifth aspect of the present invention, in the 
auxiliary information generation method of the first aspect, the 
digital data is video and audio data, and the auxiliary
information includes any of temporal information, contents, and 
degree of importance of the video and audio data. Therefore, any 
of temporal information, title, and degree of importance, which 
are useful as indexes, can be used as auxiliary information which 
is information for searching video and audio data. 

According to a sixth aspect of the present invention, there 
is provided an auxiliary information generation apparatus for 
generating auxiliary information relating to digital data when 
the digital data is generated, and recording or transmitting the 
digital data and the auxiliary information. Therefore, auxiliary 
information can be easily generated at a desired generation 
timing which is not defined in the standard. 

According to a seventh aspect of the present invention, the 
auxiliary information generation apparatus of the sixth aspect 
comprises a CPU which is included in a data generation apparatus 
for generating the digital data. Therefore, auxiliary
information can be easily generated at the end of the data
generation apparatus which generates digital data. 

According to an eighth aspect of the present invention, in 
the auxiliary information generation apparatus of the seventh 
aspect, the data generation apparatus includes a display
means for displaying the digital data, and the CPU includes: a 
menu display means for displaying, on the display means, a menu 
of auxiliary information which is to be selected by the user of 
the data generation apparatus; a model selection means for 
selecting a model of auxiliary information according to options 
in the menu selected by the user; and a model rewriting means for 
rewriting parameter sections in the selected model according to 
an instruction from the user. Therefore, the user can generate 
auxiliary information according to a menu display, resulting in a 
user-friendly auxiliary information generation apparatus. 
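
The menu, model selection, and model rewriting steps of the eighth aspect can be pictured with the following minimal sketch. It assumes that a "model" is a text template with named parameter sections; the template contents, the option names, and the functions `select_model` and `rewrite_model` are hypothetical and are not taken from the specification.

```python
from string import Template

# Hypothetical models of auxiliary information, keyed by menu option.
# The parameter sections are the $-prefixed placeholders.
MODELS = {
    "importance": Template(
        "<PointOfView viewPoint='$view'><Value>$value</Value></PointOfView>"
    ),
    "title": Template("<Title>$title</Title>"),
}


def select_model(option: str) -> Template:
    """Model selection: pick the model matching the option chosen in the menu."""
    return MODELS[option]


def rewrite_model(model: Template, **params: str) -> str:
    """Model rewriting: fill the parameter sections as instructed by the user."""
    return model.substitute(**params)


if __name__ == "__main__":
    # The menu display itself is reduced here to listing the options.
    print("Menu:", ", ".join(sorted(MODELS)))
    print(rewrite_model(select_model("importance"), view="child", value="0.9"))
```

Keeping the models as data means that a new kind of auxiliary information can be offered in the menu by adding one template, without changing the selection or rewriting logic.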

According to a ninth aspect of the present invention, there 
is provided a video data generation method which
includes reducing the length of digital data including video on
the basis of auxiliary information relating to the digital data,
thereby generating reduced digital data, and recording or
transmitting the reduced digital data. Therefore, reduced 
digital data can be generated considering not only the auxiliary 
information but also the time required for transmission or the 
capacity required for recording, and important video data can be 
generated using the generated auxiliary information, whereby the
communication cost and the capacity required for recording are
reduced. 

According to a tenth aspect of the present invention, in the 
video data generation method of the ninth aspect, the reduced 
digital data is generated by preferentially extracting digital 
data having a high degree of importance, on the basis of the 
auxiliary information. Therefore, it is possible for a
producer of data to generate reduced digital data comprising only
important segments selected by the producer.

According to an eleventh aspect of the present invention, in 
the video data generation method of the tenth aspect, generation 
of the reduced digital data is carried out on the basis of the 
time required for transmission of the digital data or the storage 
capacity required for recording of the digital data, in addition 
to the auxiliary information, and the generated reduced digital
data is transmitted. Therefore, only a portion of original data 
having a high degree of importance can be transmitted to a 
destination. 
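
One possible reading of the tenth and eleventh aspects is sketched below: segments are taken in descending order of importance until a time budget, standing in for the allowed transmission time or storage capacity, is used up. The greedy strategy, the `Segment` fields, and the function name `reduce_by_importance` are assumptions for illustration; the specification does not prescribe a particular selection algorithm.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: float       # start time in seconds
    duration: float    # length in seconds
    importance: float  # degree of importance from the auxiliary information


def reduce_by_importance(segments: List[Segment], max_seconds: float) -> List[Segment]:
    """Keep the most important segments that fit in the budget (greedy sketch)."""
    chosen = []
    remaining = max_seconds
    for seg in sorted(segments, key=lambda s: s.importance, reverse=True):
        if seg.duration <= remaining:
            chosen.append(seg)
            remaining -= seg.duration
    # Restore chronological order so the reduced data plays back naturally.
    return sorted(chosen, key=lambda s: s.start)


if __name__ == "__main__":
    clips = [Segment(0, 10, 0.2), Segment(10, 5, 0.9),
             Segment(15, 8, 0.7), Segment(23, 4, 0.5)]
    for seg in reduce_by_importance(clips, max_seconds=14):
        print(seg)
```

The budget could equally be expressed in bytes when the limiting factor is storage capacity rather than transmission time.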

According to a twelfth aspect of the present invention, in 
the video data generation method of the tenth aspect, the 
generated reduced digital data is reproduced at the side where 
the digital data is generated. Therefore, only a portion of 
original data having a high degree of importance can be 
reproduced at the end where the digital data is generated. 

According to a thirteenth aspect of the present invention,
there is provided a video data generation method which
includes reducing the length of digital data including video on
the basis of auxiliary information relating to the digital data,
and information relating to transmission, thereby generating
reduced digital data, and recording or transmitting the reduced
digital data. Therefore, reduced digital data can be generated 
considering not only the auxiliary information but also the 
information relating to transmission, and important video data 
can be generated using the generated auxiliary information, 
whereby the communication cost and the capacity required for 
recording are reduced. 

According to a fourteenth aspect of the present invention, in 
the video data generation method of the thirteenth aspect, the 
information relating to transmission is information about the 
name of a destination. Therefore, reduced digital data can be 
generated considering not only the auxiliary information but also 
the information relating to the name of the destination. 

According to a fifteenth aspect of the present invention, in 
the video data generation method of the thirteenth aspect, the 
information relating to transmission is information about the 
contents to be transmitted. Therefore, reduced digital data can 
be generated considering not only the auxiliary information but 
also the information relating to the contents to be transmitted. 

According to a sixteenth aspect of the present invention, in 
the video data generation method of the thirteenth aspect, the
information relating to transmission is information about the
capability of a terminal at a destination. Therefore, reduced 
digital data can be generated considering not only the auxiliary 
information but also the information about the capability of the 
terminal at the destination. 

According to a seventeenth aspect of the present invention, 
there is provided a video data generation method which
includes selecting at least one piece of digital data from
a plurality of pieces of digital data including video, on the
basis of auxiliary information relating to the digital data and
information relating to transmission, and recording or
transmitting the selected digital data. Therefore, at least one
piece of digital data can be selected from a plurality of
pieces of digital data according to the auxiliary information and 
the information relating to transmission to a destination, and 
the selected data can be transmitted to a destination. Thus, 
important video data is generated using the generated auxiliary 
information, whereby the communication cost and the capacity 
required for recording can be reduced. 
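
The selection described in the seventeenth aspect can be sketched as a filter over several pieces of data, combining the auxiliary information of each piece with information relating to transmission. In the sketch below the transmission information is reduced to a topic (the contents to be transmitted) and a size limit (the capability of the destination terminal); these fields, the `Clip` and `Transmission` types, and `select_clips` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Clip:
    name: str
    keywords: FrozenSet[str]  # contents described by the auxiliary information
    importance: float         # degree of importance
    size_kb: int              # size of the piece of data


@dataclass(frozen=True)
class Transmission:
    topic: str        # contents to be transmitted
    max_size_kb: int  # assumed capability limit of the destination terminal


def select_clips(clips: List[Clip], tx: Transmission) -> List[Clip]:
    """Return matching clips that the destination can handle, most important first."""
    candidates = [c for c in clips
                  if tx.topic in c.keywords and c.size_kb <= tx.max_size_kb]
    return sorted(candidates, key=lambda c: c.importance, reverse=True)


if __name__ == "__main__":
    library = [
        Clip("relay", frozenset({"athletic meeting"}), 0.9, 800),
        Clip("lunch", frozenset({"athletic meeting"}), 0.3, 300),
        Clip("beach", frozenset({"travel"}), 0.8, 500),
    ]
    print(select_clips(library, Transmission("athletic meeting", 1000)))
```

A rule keyed on the name of the destination, as in the eighteenth aspect, could be added by mapping each destination to its own `Transmission` settings.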

According to an eighteenth aspect of the present invention,
in the video data generation method of the seventeenth aspect,
the information relating to transmission is information about the
name of a destination. Therefore, at least one piece of digital
data can be selected from a plurality of pieces of digital
data according to the auxiliary information and the information
about the name of the destination, and the selected data can be
transmitted to the destination. 

According to a nineteenth aspect of the present invention, in 
the video data generation method of the seventeenth aspect, the 
information relating to transmission is information about the 
contents to be transmitted. Therefore, at least one piece of 
digital data can be selected from a plurality of pieces of
digital data according to the auxiliary information and the 
information about the contents to be transmitted, and the 
selected data can be transmitted to the destination. 

According to a twentieth aspect of the present invention, in 
the video data generation method of the seventeenth aspect, the 
information relating to transmission is information about the 
capability of a terminal at a destination. Therefore, at least 
one piece of digital data can be selected from a plurality
of pieces of digital data according to the auxiliary information
and the information about the capability of the terminal at the
destination, and the selected data can be transmitted to the
destination.

According to a twenty-first aspect of the present invention, 
there is provided a video data playback method which
includes reducing the length of digital data including video, on
the basis of auxiliary information relating to the digital data,
thereby generating reduced digital data, and displaying the
reduced digital data. Therefore, only a portion of the original
data having a high degree of importance can be played at the end
where the digital data is generated, and the time required for 
the playback can be reduced. 

According to a twenty-second aspect of the present invention, 
there is provided a video data playback apparatus for reducing 
the length of digital data including video, on the basis of 
auxiliary information relating to the digital data, thereby 
generating reduced digital data, and displaying the reduced
digital data. Therefore, only a portion of the original data 
having a high degree of importance can be played at the end where 
the digital data is generated, and the time required for the 
playback can be reduced. 

According to a twenty-third aspect of the present invention, 
there is provided a data storage medium which stores a data 
processing program for making a computer execute the auxiliary 
information generation method according to the first aspect. By 
using this recording medium, generation of auxiliary information 
can be carried out with a computer. 

According to a twenty-fourth aspect of the present invention, 
there is provided a data storage medium which stores a data 
processing program for making a computer execute the video data 
generation method according to the thirteenth aspect. By using 
this recording medium, generation of reduced video data smaller 
than the original data can be carried out with a computer. 

According to a twenty-fifth aspect of the present invention,
there is provided a data storage medium which stores a data
processing program for making a computer execute the video data 
generation method according to the seventeenth aspect. By using 
this recording medium, a process of generating video data by 
selecting at least one piece of data from the original data, can 
be carried out with a computer. 

According to a twenty-sixth aspect of the present invention, 
there is provided a data storage medium which stores the 
auxiliary information generated by the auxiliary information 
generation method according to the first aspect. Therefore,
generation of auxiliary information can be carried out with a 
computer and, furthermore, the auxiliary information can be 
stored separately from the original data. 

According to a twenty-seventh aspect of the present invention,
there is provided a data storage medium which stores the video
data generated by the video data generation method according to 
the thirteenth aspect. Therefore, selection of important video 
data from the original video data can be carried out with a 
computer, and the selected video data can be stored separately 
from the original video data. 

According to a twenty-eighth aspect of the present invention, 
there is provided a data storage medium which stores the video 
data generated by the video data generation method according to 
the seventeenth aspect. Therefore, selection of important video 
data from the original video data can be carried out with a
computer, and the selected video data can be stored separately
from the original video data. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Fig. 1 is a block diagram illustrating a combined
camera and digital VTR having an auxiliary information generator
according to a first embodiment of the present invention.

Fig. 2 is a flowchart for explaining the operation of
a CPU 11 when generating auxiliary information, according to the
first embodiment.

Fig. 3 is a diagram for explaining an example of
auxiliary information which is generated by the auxiliary
information generator according to the first embodiment.

Fig. 4 is a schematic diagram illustrating a combined
camera and digital VTR as an example of an image
recording/playback apparatus.

Fig. 5 is a block diagram illustrating an auxiliary
information generator implemented by the CPU 11.

Fig. 6 is a diagram illustrating a question displayed
according to the menu method.

Fig. 7 is a diagram illustrating a displayed question
and a way for answering the question, according to the menu
method.

Fig. 8 is a diagram illustrating another way for
answering the question according to the menu method.

Fig. 9 is a diagram illustrating an information switch
provided on the upper surface of the body of the combined camera
and digital VTR.

Fig. 10 is a diagram illustrating a pressure sensor
provided on the upper surface of the body of the combined camera
and digital VTR.

Fig. 11 is a diagram illustrating a sweat sensor
provided on the upper surface of the body of the combined camera
and digital VTR.

Fig. 12 is a block diagram illustrating an example of
a menu input means.

Fig. 13 is a diagram illustrating another example of
an answer input according to the menu method.

Fig. 14 is a block diagram illustrating another
example of a menu input means.

Fig. 15 is a diagram illustrating another example of
an answer input according to the menu method.

Fig. 16 is a block diagram illustrating another
example of a menu input means.

Fig. 17 is a diagram illustrating another example of
an answer input according to the menu method.

Fig. 18 is a block diagram illustrating a CPU having a
button pattern moving means.

Fig. 19 is a block diagram illustrating another
example of a menu input means.

Fig. 20 is a diagram illustrating another example of
an answer input according to the menu method.

Fig. 21 is a diagram illustrating an example of a menu
screen for selecting a rule for inputting auxiliary data.

Fig. 22 is a diagram illustrating an example of a menu
screen for selecting a target of capturing.

Fig. 23 is a diagram illustrating an example of a menu
screen for selecting PointOfView and its degree of importance.

Fig. 24 is a block diagram illustrating another
example of a menu input means.

Fig. 25 is a diagram illustrating an example of a
screen showing PointOfView and its degree of importance which are
selected from the menu.

Fig. 26 is a diagram illustrating another example of a
screen showing PointOfView and its degree of importance which are
selected from the menu.

Fig. 27 is a diagram illustrating another example of a
screen showing PointOfView and its degree of importance which are
selected from the menu.

Fig. 28 is a diagram illustrating another example of a
screen showing PointOfView and its degree of importance which are
selected from the menu.

Fig. 29 is a block diagram illustrating the internal
structure of a CPU which enables the screen display shown in
fig. 25.

Fig. 30 is a block diagram illustrating the internal
structure of a CPU which enables the screen display shown in
fig. 26(a).

Fig. 31 is a block diagram illustrating the internal
structure of a CPU which enables the screen display shown in
fig. 26(b).

Fig. 32 is a block diagram illustrating the internal
structure of a CPU which enables the screen display shown in
fig. 27.

Fig. 33 is a block diagram illustrating the internal
structure of a CPU which enables the screen display shown in
fig. 28.

Fig. 34 is a block diagram illustrating the internal
structure of a CPU which controls the combined camera and digital
VTR.

Fig. 35 is a diagram illustrating a handy phone
including an auxiliary information generator according to a
second embodiment of the present invention.

Fig. 36 is a block diagram illustrating the handy
phone including the auxiliary information generator according to
the second embodiment.

Fig. 37 is a block diagram illustrating a mail
formation function and a (moving) picture attachment function, of
the handy phone according to the second embodiment.

Fig. 38 is a diagram illustrating the state where a
question is displayed on a liquid crystal display of the handy
phone according to the second embodiment.

Fig. 39 is a diagram illustrating the state where a
question and answer buttons are displayed on the liquid crystal
display of the handy phone according to the second embodiment.

Fig. 40 is a diagram illustrating the state where the
user puts a finger on the liquid crystal display of the handy
phone according to the second embodiment.

Fig. 41 is a diagram illustrating the state where the
user applies a pen onto the liquid crystal display of the handy
phone according to the second embodiment.

Fig. 42 is a diagram illustrating the state where the
user puts a finger on an information switch which is provided on
the handy phone according to the second embodiment.

Fig. 43 is a diagram illustrating the state where the
user puts a finger on a pressure sensor which is provided on the
handy phone according to the second embodiment.

Fig. 44 is a diagram illustrating the state where the
user puts a finger on a sweat sensor which is provided on the
handy phone according to the second embodiment.

Fig. 45 is a diagram illustrating the state where the
user applies a pen onto a touch panel which is provided on the
liquid crystal display of the handy phone according to the second
embodiment.

Fig. 46 is a diagram illustrating the state where a
microphone provided on the back of the handy phone according to
the second embodiment picks up a sound.

Fig. 47 is a flowchart for explaining a picture data
generation method according to a third embodiment of the present
invention.

Fig. 48 is a flowchart for explaining a picture data
generation method according to the third embodiment of the
present invention.

Fig. 49 is a diagram for explaining a recording medium
on which a program and data for making a computer perform any of
the aforementioned embodiments are recorded, and a computer
system.

Fig. 50 is a diagram illustrating an example of
description of picture data according to the prior art.

Fig. 51 is a diagram illustrating a section where
auxiliary information is to be inserted in picture data.

Fig. 52 is a diagram illustrating a method of using
auxiliary information.

DETAILED DESCRIPTION OF THE INVENTION
[Embodiment 1] 

Hereinafter, an auxiliary information generation method 
according to a first embodiment of the present invention, which
corresponds to Claims 1 and 9, will be described with reference 
to the drawings. 

In advance of describing the auxiliary information 
generation method, a method of using auxiliary information will
be described with reference to figs. 52(a)-52(c). As
shown in fig. 52(a), scene A of athletic meeting, scene B
of entrance ceremony, and scene C of travel are successively
recorded as video data on the same recording medium.

Amongst these video data, hatched parts A1, B1, and C1 in
fig. 52(b) are given high values of importance as auxiliary
information. 

From the scenes A (athletic meeting), B (entrance ceremony), 
and C (travel), only the parts A1, B1, and C1 which are given
high values of importance as auxiliary information are extracted 
and combined to form an extracted scene D, and the extracted 
scene D is stored or transmitted, whereby only the scenes of high 
degrees of importance can be efficiently recorded or transmitted 
as compared with the case where all of the captured scenes of
athletic meeting, entrance ceremony, and travel are stored or
transmitted as they are. This results in a reduced
amount of use in the recording medium and reduced communication
costs. The extraction of the scenes of high degrees of
importance may be carried out during or after the
capturing of the scenes A, B, C.
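
The formation of the extracted scene D from the parts A1, B1, and C1 can be sketched as follows. The sketch assumes that each scene is held as a list of parts with a start time, an end time, and an importance value, and that "high" importance means meeting a threshold; the data layout, the threshold, and the function names are illustrative assumptions only.

```python
def extract_highlights(scenes, threshold):
    """Collect the parts whose importance meets the threshold, in recorded order.

    scenes: dict mapping a scene name to a list of
            (label, start_sec, end_sec, importance) tuples.
    Returns a list of (scene_name, label, start_sec, end_sec) forming scene D.
    """
    extracted = []
    for scene_name, parts in scenes.items():
        for label, start, end, importance in parts:
            if importance >= threshold:
                extracted.append((scene_name, label, start, end))
    return extracted


def total_seconds(parts):
    """Length of the extracted scene, useful for estimating cost or capacity."""
    return sum(end - start for _, _, start, end in parts)


if __name__ == "__main__":
    scenes = {
        "A": [("A0", 0, 120, 0.2), ("A1", 120, 150, 0.9)],
        "B": [("B1", 0, 20, 0.8), ("B2", 20, 200, 0.1)],
        "C": [("C0", 0, 50, 0.3), ("C1", 50, 90, 1.0)],
    }
    scene_d = extract_highlights(scenes, threshold=0.5)
    print(scene_d, total_seconds(scene_d))
```

With the sample values above, the extracted scene is far shorter than the recorded material, which is the source of the savings in recording capacity and communication cost.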

Further, with respect to music data, since the contents of 
the music data can be easily expressed by adding data indicating 
one phrase of the music as auxiliary information, searching for 
the music data is facilitated. 

Fig. 1 is a block diagram illustrating a video
recording/playback apparatus comprising a combined camera and
digital VTR 1000, which includes an auxiliary information
generation apparatus according to the first embodiment of the
present invention.

In fig. 1, reference numeral 11 denotes a CPU as a
controller; 12 denotes a recording/playback unit for recording or
reproducing data in/from a recording medium 13 such as a video
cassette, an optical disk, a magnetic disk, a memory card, or the
like; 13 denotes a recording medium on which multimedia data such
as video data and audio data are recorded; 14 denotes a camera
for capturing a picture, subjecting the picture to
light-to-electricity conversion, and converting a sound at
capturing into an audio signal; 15 denotes a codec unit
for performing interface between the camera and a monitor, and
compressing or decompressing input video and audio data;
16 denotes a monitor unit for playing back the scene during
recording or the video after recording, such as a liquid crystal
monitor or an electronic or liquid crystal viewfinder; and 10
denotes an external interface for exchanging data between this 
video recording/playback apparatus and an external device such as 
a PC or the like. 

Fig. 2 is a flowchart illustrating an example of
operation when the CPU 11 generates auxiliary information, and
fig. 3 is a diagram for explaining an example of auxiliary
information generated by the auxiliary information generation
apparatus.

Further, figs. 4(a) and 4(b) are schematic diagrams
illustrating a combined camera and digital VTR as an example of
the video recording/playback apparatus.

In the combined camera and digital VTR, after a
power/operation mode switch (power switch) 105 is turned on, a
recording button (capturing switch) 104 is turned on,
whereby the CPU 11 shown in fig. 1 puts the combined camera
and digital VTR into recording mode. A picture of a subject,
which is formed by a lens 100 of the camera 14, is converted into
a video signal by a light-to-electricity conversion unit (not
shown) such as a CCD included in the combined camera and digital
VTR, and a sound at capturing is converted into an audio
signal by an internal microphone 102. The video signal and
the audio signal are encoded by the codec unit 15 shown in fig.
1, converted into a format for recording by the
recording/playback unit 12, and thereafter recorded on the
recording medium 13 such as a video cassette tape, an optical
disk, or the like. During the recording, the video signal
output through the codec unit 15 is displayed by the
monitor 103 (i.e., an electronic or liquid crystal viewfinder 101
or a monitor 103 such as a liquid crystal monitor, which are
incorporated in the combined camera and digital VTR), whereby
the user can confirm whether a picture is actually being recorded
and what the recorded picture looks like.

After capturing a desired picture, when the user
operates the power/operation mode switch (operation mode switch)
105, the CPU 11 puts the combined camera and digital VTR into
fast-rewind mode, and detects the recording start position on the
recording medium. Thereafter, when the user operates the
power/operation mode switch (operation mode switch) 105, the CPU
11 puts the combined camera and digital VTR into playback mode.
Thereby, the recording/playback unit 12 reads the video data of
the shot picture from the recording medium 13, the codec unit 15
decodes the video signal, and the monitor 103 plays the video.
Thereby, the combined camera and digital VTR performs playback
with the monitor 103, and the user can enjoy the captured
picture on the screen of the monitor 103, which is wider than
the viewfinder 101. Further, the user can easily confirm whether
the desired picture has been shot or not, search for a point of
time where auxiliary information is to be input, or
confirm whether scenes extracted from the video data on the basis
of the auxiliary information are as intended or not. When the
extracted scenes are not as intended, the user can easily perform
an editing operation such as re-inputting of the auxiliary
information.

Furthermore, reference numeral 108 denotes a battery
pack for driving the combined camera and digital VTR; 107 denotes
a cassette lid which covers a portion where a video cassette
tape is mounted on the combined camera and digital VTR; 106
denotes a grip belt which supports a hand of the user holding the
VTR at capturing; and 109 denotes an information button
for inputting auxiliary information.

Although it is not shown in fig. 4, the played video
can be displayed on an external wider monitor which is connected
to the combined camera and digital VTR through the external
interface 10 shown in fig. 1, or the video data recorded on
the recording medium can be transmitted to a personal computer
through the external interface 10 to be edited on the personal
computer.

Hereinafter, an auxiliary information generation method will 
be described taking, as an example, the video recording/playback 
apparatus having the auxiliary information generation apparatus 
constructed as described above, with reference to figs. 1,
2, and 3.

The combined camera and digital VTR having the auxiliary
information generation apparatus shown in fig. 1 has a
construction similar to that of an ordinary combined camera and
digital VTR, as shown in fig. 4. In the combined camera
and digital VTR, a picture captured by the camera 14 is
monitored by the monitor 16 and, simultaneously, it is compressed
by the codec unit 15, and recorded on the recording medium 13
through the recording/playback unit 12.

This combined camera and digital VTR is different from the
conventional one only in the operation of the CPU 11, and the CPU
11 can generate auxiliary information on the basis of control
signals which are supplied from the user information button 109,
the capturing switch 104, and the power switch 105. That
is, the CPU 11 corresponds to the auxiliary information
generation apparatus.

Fig. 5 is a block diagram illustrating the
construction of the auxiliary information generation apparatus
implemented by the CPU 11. In fig. 5, reference numeral
110 denotes a menu-basis auxiliary information storage means
which stores a plurality of models of auxiliary
information corresponding to different menus; 111 denotes an
auxiliary information model selection means for selecting one of
the auxiliary information models stored in the menu-basis
auxiliary information storage means 110, according to an
instruction from a menu input means 116; 112 denotes an auxiliary
information model rewriting means for rewriting the auxiliary
information model selected by the auxiliary information model
selection means 111, according to parameters supplied from a
parameter input means 115; 113 denotes a recording timing control
means for controlling the timing of recording the auxiliary
information, according to an input from an auxiliary information
recording timing input means 117; 114 denotes an auxiliary
information recording means for writing the auxiliary information
model which has been rewritten by the auxiliary information model
rewriting means 112, into the recording medium, under control of
the recording timing control means 113.

The auxiliary information model selection means 111, the 
auxiliary information model rewriting means 112, the recording 
timing control means 113, and the auxiliary information recording 
means 114 are parts implemented by the CPU 11 itself, the menu- 
basis auxiliary information storage means 110 is a part 
implemented by a ROM (not shown) included in the CPU 11, and the 
parameter input means 115, the menu input means 116, and the 
auxiliary information recording timing input means 117 are parts 
implemented by the user information buttons 109, the recording
button 104 as the capturing switch, the power/operation
mode switch 105 as the power switch, and the like.

Fig. 2 shows the operation of the CPU 11 when
generating auxiliary information relating to digital data. It is 
assumed that the CPU 11 operates even in the stand-by state where 
the power/operation mode switch 105 of the combined camera and 
digital VTR is OFF. 

Initially, when the user turns on the power/operation mode 
switch 105 of the combined camera and digital VTR (step 21), the 
CPU 11 is notified that the power is turned on. Thereby, it is 
set by default that auxiliary information is to be input.

Next, whether the type of auxiliary information to be
generated should be selected or not is input by a
combination of the operations of the switches such as the
recording button 104, the power/operation mode switch 105, and
the like (step 22). This selection may be performed by a menu
method, that is, by posing a question to the user with a menu
displayed on the monitor 103. To be specific, the CPU 11 outputs
a question to the monitor 16 as shown in fig. 6. When a
touch panel 103a is provided as shown in fig. 7, the CPU 11
outputs answer buttons 103b to the question, and displays the
answer buttons on the monitor 16. At this time, the CPU 11
searches the ROM which stores the question. Instead of inputting
the auxiliary information by default, a question may be posed to
the user by the menu method to obtain an answer to the question
from the user.

The user answers the question as follows. That is, as
shown in fig. 8, the user selects a menu button by
performing, with his/her thumb, a combination of switching
operations of the recording button 104, the power/operation mode
switch 105, and the like, which are provided on the rear right
side of the body of the combined camera and digital VTR, a
predetermined number of times in a predetermined order.
Alternatively, as shown in figs. 9, 10, and 11, an
information switch 109, a pressure sensor 109a, and a sweat
sensor 109b may be provided on the upper surface of the body, and
the user may select a menu button by pressing it with the fingers
of the hand that grips the combined camera and digital VTR. When
using a sensor, as shown in fig. 12, it is necessary to
normalize the sensor output by a sensor output normalization unit
116a, compare, by a comparator 116c, the normalized sensor output
with a threshold which is generated by a threshold generation
unit 116b, and then output the comparison result to the auxiliary
information pattern selection means 111.
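
The normalize-then-compare chain of fig. 12 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the raw range (0 to 1023), the threshold of 0.5, and the function names `normalize` and `is_pressed` are hypothetical and do not appear in the specification.

```python
def normalize(raw, raw_min, raw_max):
    """Map a raw sensor reading onto [0.0, 1.0] (role of unit 116a)."""
    if raw_max <= raw_min:
        raise ValueError("raw_max must exceed raw_min")
    # Clamp first so out-of-range readings cannot leave [0.0, 1.0].
    clamped = min(max(raw, raw_min), raw_max)
    return (clamped - raw_min) / (raw_max - raw_min)


def is_pressed(raw, raw_min=0, raw_max=1023, threshold=0.5):
    """Compare the normalized output with a threshold (roles of 116b, 116c).

    The default range and threshold are assumed values for illustration.
    """
    return normalize(raw, raw_min, raw_max) >= threshold
```

Normalizing before the comparison lets one threshold serve sensors with different output ranges, which is presumably why the pressure sensor and the sweat sensor can share the same comparator stage.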

Furthermore, when the liquid crystal monitor 103 is provided
with a touch panel 103a as shown in fig. 7, the user may
select an answer by putting a finger F on an option button 103b
displayed on the liquid crystal monitor. Further, as shown in
fig. 13, the user may select an answer by applying a pen P
such as a plastic pen to the touch panel. In these cases, as
shown in fig. 14, within the menu input means 116, the
coordinates of the portion on the panel which is pressed by the
finger F or the pen P are supplied from the touch panel 103a to a
coordinate position input means 116d, and a position comparison
means 116e compares the coordinates with the positions where the
option buttons 103b are displayed, which positions are supplied
from the CPU 11, whereby the selected option button is reported
to the auxiliary information model selection means 111. Further,
as shown in fig. 15, a selected option may be input
by hand-writing an answer to the question with a pen on the touch
panel 103a, and automatically recognizing this answer. In this
case, as shown in fig. 16, the coordinates of the portion
on the touch panel 103a, which portion is pressed by the finger F
or the pen P, are supplied from the touch panel 103a to the
coordinate position input means 116d. Then, a pattern matching
means 116f recognizes the hand-written character string by
matching the input characters, given as a trail of points pressed
by the finger or pen, against similar standard character patterns.
Then, an answer candidate collation means 116g collates the
candidates of answers to the question, which candidates are
issued from the CPU 11, with the recognized character string to
judge whether the answer is appropriate or not. When the answer
is appropriate, the answer is output to the auxiliary
information model selection means 111.
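
The position comparison of fig. 14 amounts to a rectangle hit test. The following Python sketch shows one way it could work; the `Button` type, its fields, and the pixel coordinates in the usage note are assumptions made for illustration and are not taken from the specification.

```python
from typing import NamedTuple, Optional, Sequence


class Button(NamedTuple):
    """Display rectangle of one option button (hypothetical layout type)."""
    label: str
    x: int
    y: int
    width: int
    height: int


def hit_test(buttons: Sequence[Button], px: int, py: int) -> Optional[str]:
    """Return the label of the button containing (px, py), else None.

    Plays the role of the position comparison means 116e: the pressed
    coordinates are compared with the positions where buttons are shown.
    """
    for b in buttons:
        if b.x <= px < b.x + b.width and b.y <= py < b.y + b.height:
            return b.label
    return None
```

For example, with `Button("YES", 10, 100, 80, 40)` and `Button("NO", 110, 100, 80, 40)`, a press at (20, 110) would select "YES", while a press in the gap between the two buttons would select nothing.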

Furthermore, as shown in figs. 17(a) and 17(b),
option buttons 101a may be displayed in the viewfinder 101. In
this case, the option (menu) buttons are automatically
contrast-inverted (highlighted) one by one, and when the
highlighted option button matches the option the user desires,
the user selects this option button by appropriately combining
the operations of the recording button 104, the power/operation
mode switch 105, and the like as shown in fig. 8.
Alternatively, the user may select an option button by pressing
the information switch 109, or the pressure sensor 119a, or the
sweat sensor 119b, which is provided at the upper surface of the
body as shown in figs. 9, 10, or 11, with the fingers of
the hand that grips the video camera. The successive
contrast inversion of the option buttons is realized as follows.
That is, as shown in fig. 18, a button pattern formation
means 11a, a button pattern inversion means 11b, and a button
designation means 11c are implemented by software or the like in
the CPU 11, and the contrast of a pattern of an option button
which is generated by the button pattern formation means 11a is
inverted by the button pattern inversion means 11b. At this time,
the option buttons to be contrast-inverted (highlighted)
are designated one by one by the button designation
means 11c, whereby successive contrast inversion of the option
buttons is realized. The button pattern formation means 11a, the
button pattern inversion means 11b, and the button designation
means 11c may be implemented by hardware outside the CPU 11.
When the user lightly presses his/her eye onto a pad 101b of the
viewfinder 101 as shown in fig. 10, or winks as shown in
fig. 11, a pressure sensor (not shown) embedded in the pad
101b surrounding the viewfinder 101 senses this, whereby the
corresponding option button is designated. In this case, in the
menu input means 116, as shown in fig. 19, a pressure
sensor output input means 116i inputs the output from the
pressure sensor, and an input button decision means 116j receives,
through a button position input means 116h, information
indicating an option button which is currently highlighted
according to the output from the button designation means 11c.
While the option buttons are highlighted one by one,
when the user operates the pressure sensor by lightly pressing
his/her eye onto the pad 101b at an option button he/she desires,
the input button decision means 116j decides that this button is
the input button. Alternatively, a light-emitting element 116X and a
light sensor 116Y which are included in the viewfinder 101 as
shown in fig. 20 may be used instead of the pressure sensor.
In this case, the light-emitting element 116X applies a weak
light to the user's eye, and the light sensor 116Y senses the
reflected light from the eye to detect whether the user opens the
eye or not, and the output from the light sensor is input
to the sensor output input means 116i, thereby designating the
corresponding option.
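
The scan-and-select behaviour described above (buttons highlighted one by one, with the currently highlighted button taken as the input when the sensor fires) can be sketched as follows. This is a simplified model under assumptions: time is divided into ticks, one button is highlighted per tick, and the sensor output has already been reduced to one boolean per tick. The function name `select_highlighted` is hypothetical.

```python
def select_highlighted(options, sensor_events):
    """Return the option highlighted when the sensor first fires.

    options:       option labels, highlighted cyclically one per tick
                   (role of the button designation means 11c).
    sensor_events: iterable of booleans, one per tick, True when the
                   pressure or light sensor detects the user's action.
    Returns None if the sensor never fires.
    """
    if not options:
        raise ValueError("at least one option is required")
    for tick, fired in enumerate(sensor_events):
        highlighted = options[tick % len(options)]
        if fired:
            # Role of the input button decision means 116j.
            return highlighted
    return None
```

A single binary sensor is enough to choose among any number of options in this scheme, at the cost of waiting for the desired button to come around.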

When the user does not select the type of auxiliary
information, the instruction of inputting auxiliary information
is canceled. On the other hand, when the user makes an
instruction to select the type of auxiliary information,
selection is carried out in step 23. At this time, the user may
select a description to be used from descriptions of auxiliary
information based on MPEG-7 or the like. However, it is
difficult for ordinary users to understand and execute such
selection because ordinary users are not expected to have
knowledge about MPEG-7 and XML. Therefore, as shown by a menu on
the monitor (fig. 21(a)) or a menu in the viewfinder
(fig. 21(b)), the rule of inputting auxiliary information
is selected from a menu having the
following options: recording auxiliary information for every
important scene, inputting auxiliary information to a header or
the like every time the capturing location is changed
(this operation is achieved by combination with the
power/operation mode switch), and inputting auxiliary information
at predetermined intervals. The processes and circuits required
at this time are implemented by the same method as the menu
method for deciding whether the type of auxiliary information
should be selected or not. Alternatively, as shown by a menu on
the monitor (fig. 22(a)) or a menu in the viewfinder
(fig. 22(b)), the type of auxiliary information may be
selected, according to the purpose, from a menu having options
such as snap capturing, storage/record, athletic meeting,
entrance ceremony, etc., and the selected auxiliary information
may be converted into a set of descriptions based on MPEG-7 or
the like in the CPU 11. Although VideoSegment is used as the
set of descriptions, the viewpoint can be changed according to
each purpose.

The above-mentioned selection can be realized by storing 
description sets based on MPEG-7 or the like in a ROM (not shown) , 
and selecting the contents in the ROM by the CPU 11 according to 
the menu option selected by the user. 

Turning to fig. 2, when capturing is started
(step 24), a description of VideoSegment indicating a part of
video is recorded as auxiliary information. During
capturing, when the user desires to record auxiliary
information, for example, when an important scene or a scene to
be emphasized is being captured or at the instant when the
scene is changed, the user presses the information button 109,
and the CPU 11 detects it, generates auxiliary information, and
records the auxiliary information. While in the example shown in
fig. 2 a description of VideoSegment and a description of
importance are recorded in steps 27 and 28, respectively, the
present invention is not restricted thereto. The auxiliary
information selected in step 23 may be recorded. Furthermore, a
plurality of information buttons, which are respectively assigned
to different kinds of auxiliary information, may be provided on
the body of the combined camera and VTR. In this case, the user
presses any of these information buttons to record the
corresponding auxiliary information. This operation is repeated
until capturing is completed.

Fig. 3 shows an example of auxiliary information
generated according to the first embodiment, and MPEG-7 is
employed for the description. In this first embodiment, in
contrast with the conventional method, XML description is
employed. In fig. 3, a description of MediaInformation
between descriptors 301a and 301b describes the entire
information of this video data. That is, it indicates that the
file format of this video data is MPEG4. SegmentDecomposition
shown by a descriptor 302 indicates that the VideoSegment, which
is a part of the video data, is temporal, i.e., that the
VideoSegment is arranged temporally. The above-mentioned
description is generated and recorded after the power is turned
on or before Segment description is started. In fig. 3, a
description between descriptors 303a and 303b is a description
relating to one segment (= one scene), and it indicates that one
scene continues for 1M19S (i.e., one minute and nineteen seconds)
at 30F (i.e., 30 frames per sec). The VideoSegment is followed
by generated data, and a title indicated by a descriptor 304 and
the like should be input not during capturing but
after capturing. A description of PointOfView between
descriptors 305a and 305b indicates the degree of importance, and
it is expressed by a value for each Viewpoint as shown by a
descriptor 306.

Generally, PointOfView indicates a point for discrimination
from another object. The user may input all of the auxiliary
information by XML description, or XML description may be
automatically generated by preparing a plurality of menus
on assumption of user conditions, and selecting a menu most
appropriate to the capturing condition from the menus.
For example, when capturing a scene in which a son and a
daughter appear in an athletic meeting, as shown in fig.
23(a) (menus on the monitor) or fig. 23(b) (menus in the
viewfinder), in a menu of "athletic meeting", buttons 103m, 103n,
101m, 101n corresponding to tags of "son", "daughter",
"excitement" are prepared in advance, and a value, i.e.,
the degree of importance, is shown according to the time length
of a scene where the son and the daughter appear. Further,
"excitement" means a climax, and this description can be
recorded by operating the information button or the like provided
on the equipment. The simplest method of setting the value of
excitement is setting the value of importance at either
"0" (= not important) or "1" (= most important). However, one of
the value buttons shown in the menu on the monitor or the menu in
the viewfinder may be selected by the menu method, i.e., in the
same manner as that described for the case of answering a
question as to whether the type of auxiliary information should
be selected or not. At this time, values in increments of "0.1"
between "0" and "1" may be input by a combination of
pressing the power key and the recording key, and one of these
values may be selected. Alternatively, when an
excitement button is provided at the upper surface of the body of
the combined camera and VTR, the user may input a value of
excitement by operating this button. Furthermore, a
value of excitement may be input by sensing the
degree of excitement of the user from the fingers of the
user which are put on a pressure sensor or a sweat sensor
provided at the upper surface of the body. Furthermore, as shown
in fig. 24, the loudness of cheers or the loudness of
the user's voice at capturing may be measured, and the value
of excitement may be input according to the
measured value. In this case, the level of an audio signal
obtained by a microphone (not shown) of the camera 14 is detected
by an audio signal level detecting means 116k, and this level is
normalized by an audio signal level normalization means 116l.
Then, a comparison means 116n compares the normalized level with
a plurality of thresholds which are generated by a
threshold generator 116m, whereby a value to be designated as a
menu input can be selected automatically.
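
A minimal Python sketch of the automatic selection in fig. 24 follows. It assumes evenly spaced thresholds at 0.1, 0.2, ..., 1.0, matching the "0.1" increments mentioned above, and a known full-scale level for normalization. Both are assumptions, as is the function name `excitement_from_level`; the specification does not state how the thresholds are spaced.

```python
import bisect


def excitement_from_level(level, full_scale, steps=10):
    """Quantize a measured audio level to an excitement value in [0, 1].

    level / full_scale plays the role of the normalization means 116l,
    the threshold list that of the threshold generator 116m, and the
    bisect lookup that of the comparison means 116n.
    """
    if full_scale <= 0:
        raise ValueError("full_scale must be positive")
    normalized = min(max(level / full_scale, 0.0), 1.0)
    # Assumed thresholds: 0.1, 0.2, ..., 1.0 for the default steps=10.
    thresholds = [i / steps for i in range(1, steps + 1)]
    # Number of thresholds that the normalized level reaches or exceeds.
    reached = bisect.bisect_right(thresholds, normalized)
    return reached / steps
```

With these assumptions, a level at 75% of full scale yields 0.7, silence yields 0.0, and any level at or above full scale yields 1.0.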

As described above, since auxiliary information is generated
in connection with the capturing operation, the user can
easily generate auxiliary information at capturing.
Further, PointOfView may be input after capturing.

The degree of importance or the value of
excitement input as described above can be displayed singly on
the liquid crystal monitor or the viewfinder, or it can be
superimposed on the monitor picture during capturing,
whereby the user can confirm the input value. Further,
an icon corresponding to the contents of a message may be colored
with a color according to the degree of importance, or the
lightness in color of the icon may be increased according to the
value, whereby the user can confirm the input value.

For example, as shown in fig. 25, the importance of
specified persons or the importance of excitement may be
displayed on the monitor 103 or the viewfinder 101 by switching
the screen from the scene being captured, or it may be
superimposed on the scene being captured. Furthermore, as
shown in fig. 27, specified persons may be indicated by
icons S and D, and the importance of excitement may be
indicated by the color of the icons. Fig. 27(a) shows the
case where the degree of importance is low, and fig. 27(b)
shows the case where the degree of importance is high. Further,
as shown in fig. 28, the importance of excitement
may be indicated by the brightness in color of the icons.
Fig. 28(a) shows the case where the degree of importance is
low, and fig. 28(b) shows the case where the degree of
importance is high.

The screen display shown in fig. 25 is realized as
follows. As shown in fig. 29, the CPU 11 is provided with
a display message generation means 11d, a display message
rewriting means 11e, and a screen output means 11f, and
parameters in a typical display message which is generated by the
display message generation means 11d (in fig. 25, "son",
"daughter", "0.7") are rewritten by the display message rewriting
means 11e and, thereafter, the rewritten display message is
output to the monitor 16 by the screen output means 11f.
The display message generation means 11d, the display message
rewriting means 11e, and the screen output means 11f may be
implemented by hardware outside the CPU 11. A screen display
shown in fig. 26(a) is realized as follows. As shown in
fig. 30, a superimposing means 11g is placed between the
display message rewriting means 11e and the screen output means
11f shown in fig. 29, and a message "persons: son and
daughter, degree of importance = 0.7" is superimposed on the
picture being captured, which is output from the
codec unit 15. Further, a screen display shown in fig.
26(b) is realized as follows. As shown in fig. 31, the
picture being captured, which is output from the
codec unit 15, is scaled down by a screen scale-down means 11h,
and a message screen output from the display message
rewriting means 11e is moved to the lower part of the monitor
screen by a message moving means 11i. Then, these screens are
composited by a screen composition means 11j, and the composite
screen is output to the monitor 16 by the screen output
means 11f. The screen display shown in fig. 27 is realized
as follows. As shown in fig. 32, an icon corresponding to
a message is selected by an icon selection means 11l from an icon
storage means 11k which stores a plurality of icons
corresponding to the menu, and the selected icon is colored
according to the degree of importance by an icon coloring means
11m, and the colored icon is output to the monitor 16 by
the screen output means 11n. Further, the screen display shown
in fig. 28 is realized as follows. As shown in fig.
33, using an icon contrast changing means 11o instead of the icon
coloring means 11m shown in fig. 32, the contrast of the
icon is changed according to the degree of importance.

By the way, in the example shown in fig. 3, since the
appearance time of "son" is only "0.2" in the initial video
segment Seg0 while the appearance time of "daughter" is "0.6",
this scene lacks interest, and therefore, the value of
excitement is set at "0.1". In the next video segment
Seg1, since both "son" and "daughter" appear for the same
appearance time "0.6", this scene is interesting, and therefore,
the value of excitement is set at "0.8". Although, in
the above description, the degree of importance is the appearance
time in one scene, it is also possible to express the degree of
importance by the value of excitement or the value of
"son" or "daughter" (frequency of appearance). Further, the
degree of importance may be determined by combining the sizes of
"son" and "daughter" on the screen and the values of a
plurality of viewpoints. Further, although "son" and "daughter"
are selected as a sub menu of a menu "athletic meeting", these
may be selected as examples of viewpoints, and this selection can
be executed by selecting a value button in PointOfView.

As described above, since a menu is selected according to
the capturing condition and required parameters are
selected from the menu, auxiliary information can be generated
without knowledge of the XML rules, and the generated
auxiliary information can be attached to the original
captured data.

That is, when the user selects a menu through the menu input
means 116 shown in fig. 5, the auxiliary information
pattern selection means 111 selects one of the auxiliary
information patterns which correspond to different menus and are
stored in the menu-basis auxiliary information storage means 110.
In the example of fig. 3, <PointOfView Viewpoint="son">,
<PointOfView Viewpoint="daughter">, <PointOfView
Viewpoint="exciting">, ... correspond to the patterns. In the
auxiliary information pattern corresponding to the selected menu,
a portion corresponding to a variable should be rewritten. That
is, this portion is selected by selecting a tag prepared in the
menu, and the auxiliary information pattern rewriting means 112
changes the variable in the auxiliary information pattern to the
information specified by the user, according to the tag, thereby
completing the auxiliary information. In the example of
fig. 3, the variable is <Value>, and this is changed to the
value specified by the user (e.g., "0.6", "0.8", etc.).
Thereafter, as shown in fig. 5, the auxiliary information
recording means 114 records the completed auxiliary information
in the header section of a scene or the like according to a
timing specified by the user through the auxiliary information
recording timing input means 117. The recording timing control
means 113 controls the recording means 114 so that the auxiliary
information is recorded in the header section of a scene
corresponding to the timing specified by the user.
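
The select-then-rewrite flow can be illustrated with Python's standard `xml.etree.ElementTree` module. The sketch below is not the description format of fig. 3 itself: the nesting of `<Value>` directly inside `<PointOfView>`, the `MODELS` table, and the function name `rewrite_model` are assumptions made for illustration. Only the element names, the `Viewpoint` attribute, and the viewpoint names come from the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the menu-basis storage means 110:
# each menu maps to the viewpoints its pattern provides.
MODELS = {
    "athletic meeting": ["son", "daughter", "exciting"],
}


def rewrite_model(menu, values):
    """Select the pattern for a menu and fill in the user's values.

    menu:   menu name chosen through the menu input means 116.
    values: mapping from viewpoint name to a value between 0 and 1,
            as supplied through the parameter input means 115.
    Returns one serialized PointOfView fragment per supplied viewpoint.
    """
    fragments = []
    for viewpoint in MODELS[menu]:           # selection (means 111)
        if viewpoint not in values:
            continue
        elem = ET.Element("PointOfView", {"Viewpoint": viewpoint})
        # Rewriting the variable part (means 112); nesting is assumed.
        ET.SubElement(elem, "Value").text = f"{values[viewpoint]:.1f}"
        fragments.append(ET.tostring(elem, encoding="unicode"))
    return fragments
```

Calling `rewrite_model("athletic meeting", {"son": 0.6, "exciting": 0.8})` would produce two fragments, one for "son" and one for "exciting", without the user ever writing XML by hand.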

Hereinafter, a description will be given of the case where
the combined camera and VTR reproduces the recorded video data,
using the auxiliary information recorded as described above. As
shown in fig. 34, an auxiliary information detection means
11p detects the auxiliary information from a signal which is
reproduced from the recording medium by the recording/playback
unit 12, and an auxiliary information judgement means 11q judges
whether or not the degree of importance of the auxiliary
information is larger than a value set by the user, for example
"0.5". When the degree of importance is larger than the set
value, a recording medium drive control means 11r sets the
recording medium playback speed of a recording medium drive means
(not shown) at "normal playback", and a playback control means
11s controls the codec unit 15 so as to decode a signal
reproduced at this time. On the other hand, when the degree of
importance is smaller than "0.5" which is set by the user, the
recording medium drive control means 11r sets the recording
medium playback speed of the recording medium drive means (not
shown) at "fast-forward mode", and the playback control means 11s
controls the codec unit 15 so as not to decode the reproduced
signal, whereby playback skipping of a section having a low
degree of importance is achieved.
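
The playback decision of fig. 34 reduces to a per-segment comparison, sketched below in Python. The segment representation as (name, importance) pairs and the function name `plan_playback` are assumptions. The text does not say what happens when the importance equals the set value, so this sketch treats equality as "skip", which is also an assumption.

```python
def plan_playback(segments, threshold=0.5):
    """Decide the drive mode for each recorded segment.

    segments:  iterable of (name, importance) pairs, where importance
               is the value read back by the detection means 11p.
    threshold: value set by the user ("0.5" in the text's example).
    Returns a list of (name, mode) pairs. Segments at exactly the
    threshold are skipped here, which is an assumed tie-break.
    """
    plan = []
    for name, importance in segments:
        # Role of the judgement means 11q.
        mode = "normal" if importance > threshold else "fast-forward"
        plan.append((name, mode))
    return plan
```

Applied to the two segments of fig. 3, with excitement values 0.1 for Seg0 and 0.8 for Seg1, this plan would fast-forward through Seg0 and play Seg1 at normal speed.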

As described above, according to the first embodiment of the
present invention, in the combined camera and digital VTR,
parameters in a model of auxiliary information can be
input according to a method which is familiar to
ordinary users, without presupposing knowledge about MPEG-7,
which cannot be expected of ordinary users. For example, the
parameters are input by combining operations of the recording
switch, the power button, and the like which are operated in
synchronization with the capturing operation by the user, by
operating the buttons of menu options which are displayed on the
monitor, by operating the information switch which is provided on
the body of the VTR, by touching or pen-writing on the liquid
crystal monitor, or by sensing the user's eye against the
viewfinder. Thereby, the user can input the auxiliary
information, and can easily obtain index information when the
captured moving picture is played back later.

While in this first embodiment whether auxiliary information 
should be selected or not is decided by the user every time the 
power button is turned on, the present invention is not 
restricted thereto. Whether auxiliary information should be 
selected or not may be set independently of turn-on of the power 
button. For example, in "VideoSegment", only the values such as 
"View" which are likely to change during capturing may be 
generated at capturing while other values are generated 
in advance of capturing. Also in this case, the same 
effects as described above are achieved. Further, while in this 
first embodiment auxiliary information is generated in 
synchronization with the capturing button, there is a 
case where a camera or the like is set such that the power is 
turned off manually or automatically for long battery life. In 
order to cope with such a case, auxiliary information may be 
generated in synchronization with the on/off of the power. 

Further, while in this first embodiment the descriptions of 
XML or the like are explained for only several kinds of 
parameters, the present invention is not restricted thereto, and 
it is possible to select desired parameters from a menu according 
to the purpose. Further, although the first embodiment is 
described on the premise that a captured picture is recorded, 
a captured picture is not necessarily recorded, and the 
invention can be used also when compressed video and audio data 
are transmitted as they are for use on a network or the like. 
Furthermore, although auxiliary information is generated at 
capturing, it is also possible to generate auxiliary information 
at playback by using the information button at the time of 
playback, i.e., when playing a picture captured by the combined 
camera and VTR itself to confirm the picture. When recording or 
transmitting the generated auxiliary information, it is decided, 
according to the construction of the device or system, whether 
the auxiliary information should be recorded/transmitted after 
being multiplexed in the captured video and audio data, or 
should be stored in another place for recording/transmission so 
that the auxiliary information can be recorded/transmitted 
independently of the captured video and audio data. 

Furthermore, it is also possible to detect a section in the 
video data corresponding to a degree of importance specified by 
the user, and reproduce only this section to be displayed on the 
monitor. Therefore, the CPU can extract only data having a high 
degree of importance from the video data recorded on the 
recording medium, and reproduce the extracted data, whereby 
confirmation of the recorded data by the monitor can be carried 
out with efficiency. Further, the user can enjoy the recorded 
"work" without feeling tired, and the power consumption is 
reduced to secure more driving time. 
[Embodiment 2] 

Hereinafter, an image data generation method according to a 
second embodiment of the present invention will be described 
with reference to the drawings. 

The image data generation method according to the second 
embodiment will be described taking, as an example, a case where 
an auxiliary information generation apparatus is contained in a 
handy phone having a movie function or a camera function. 

Fig. 35 shows a handy phone having a movie function or 
a camera function. The handy phone has a lens 206 for 
capturing on the back of its body, and a light-to-electricity 
converter (not shown) inside the body, whereby a captured 
(moving) picture can be attached to a mail or the like, and 
transmitted through a mobile communication network. The sound 
during picture capturing is received by a microphone 207. 
In fig. 35, reference numeral 200 denotes a microphone for 
conversation, 201 denotes a ten key, 202 denotes a function key, 
203 denotes a liquid crystal display, 204 denotes a speaker for 
conversation, and 205 denotes a whip antenna. 

Fig. 36 is a block diagram of a handy phone 200 having 
an auxiliary information generation apparatus (CPU 11) according 
to the second embodiment of the invention. This handy phone has 
a movie function or a camera function. In fig. 36, the 
same reference numerals as those shown in fig. 1 denote the 
same or corresponding parts. Further, reference numeral 19 
denotes an antenna, 18 denotes an RF front end for processing a 
high-frequency signal which is received/transmitted by the 
antenna 19, and 17 denotes a modulation/demodulation unit for 
modulating/demodulating the sound from the microphone, and 
outputting an audio signal to the speaker. 

The operation of the handy phone from generation of 
auxiliary information to generation of video data is identical to 
that described for the combined camera and digital VTR having the 
auxiliary information generator according to the first embodiment 
of the invention. In the handy phone, auxiliary information 
relating to digital data is generated and attached to (moving) 
picture data captured by the camera function during 
capturing or after capturing, and only a portion 
of the digital data having a high priority is extracted using the 
auxiliary information to generate reduced digital data, and the 
reduced digital data so generated is attached to a mail or the 
like to be transmitted, whereby the communication cost is reduced. 
Since the function of generating and adding auxiliary 
information and the playback operation by itself are identical to 
those described for the first embodiment, repeated description is 
not necessary. However, the recording medium 13 shown in 
fig. 36 is limited to a semiconductor memory and, in this 
case, the recording medium drive control means 11r is a memory 
address jumping means. Further, the playback control monitor 16 
is limited to a liquid crystal display. 

Fig. 37 is a block diagram illustrating the 
construction for implementing mail formation and (moving) picture 
attachment. To be specific, in fig. 37, when the user 
operates the ten key 201, a mail document formation means 11t 
forms a mail document by appropriately selecting characters and 
numbers from a plurality of characters and numbers which 
are assigned to each key. As described above, auxiliary 
information is added to picture data which has been obtained by 
capturing a picture with the lens 206 and subjecting the 
picture to light-to-electricity conversion by the 
light-to-electricity converter (not shown) such as a CCD. A 
transmission picture data formation means 11u forms picture data 
to be transmitted (hereinafter, referred to as transmission picture 
data) by extracting, from the obtained picture data, only a 
section where the degree of importance of the auxiliary 
information is higher than a predetermined value. A data 
attachment means 11v converts this transmission picture data into 
a file to be attached to the mail document. A protocol 
conversion means 11w converts the protocol so that the mail 
document to which the transmission picture data is attached can 
be output to a network such as the Internet. 
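
The flow from mail formation to picture attachment can be illustrated roughly as follows, using the standard Python `email` package. This is a sketch under stated assumptions rather than the construction of fig. 37: the function name `build_mail`, the representation of each section as an `(importance, data)` pair, the file name `picture.bin`, and the simple concatenation of the selected sections are all hypothetical, and the protocol conversion for output to the network is not modelled.

```python
from email.message import EmailMessage


def build_mail(to_addr, subject, body_text, sections, threshold):
    """Form a mail document and attach only the important sections.

    sections: list of (importance, data_bytes) pairs (assumed layout).
    """
    # Transmission picture data: keep only sections whose degree of
    # importance is higher than the predetermined value.
    picture = b"".join(data for importance, data in sections
                       if importance > threshold)

    # Mail document formation.
    msg = EmailMessage()
    msg["To"] = to_addr
    msg["Subject"] = subject
    msg.set_content(body_text)

    # Data attachment: convert the picture data into an attached file.
    if picture:
        msg.add_attachment(picture, maintype="application",
                           subtype="octet-stream", filename="picture.bin")
    return msg
```

Real compressed video cannot in general be cut and joined at arbitrary byte positions, so the concatenation above only stands in for whatever extraction the codec unit actually performs.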

In this second embodiment, decision as to whether auxiliary 
information should be generated or not, decision as to whether 
auxiliary information should be selected or not, selection of 
persons, input of the value of importance of 
excitement, and the like are carried out in the same 
manner as described for the combined camera and VTR according to 
the first embodiment. To be specific, when a menu is displayed 
on the liquid crystal display, the user selects an option button 
with the function key or the user information button, or the user 
selects an option button by putting a finger on a portion 
corresponding to the option button on a touch panel which is 
provided on the liquid crystal display. Alternatively, when only 
a question is displayed on the liquid crystal display, the user 
inputs an answer to this question by using the ten key or the 
function key, or the user inputs an answer by hand-writing on the 
touch panel, or the user selects an answer by putting a finger on 
a pressure sensor or a sweat sensor which is provided on the body 
of the handy phone. Further, an answer to the question may be 
selected by detecting the level of cheers or user's voice which 
is picked up by the microphone. 

That is, also in the handy phone, it is possible to input 
auxiliary information by combination of operations of the 
recording switch, the power button, and the like which are 
assigned to the ten key 201 and the function key 202 operated in 
synchronization with the capturing operation of the user. 
At this time, a question is displayed on the liquid crystal 
display 203 as shown in fig. 38, or answer buttons to the 
question are displayed on the touch panel 203a as shown in 
fig. 39, and the user can select an answer to the question 
by applying the finger 4 or pen P onto a portion corresponding to 
a desired menu button displayed on the liquid crystal display 203 
as shown in fig. 40 or 41. Alternatively, it is possible 
to input auxiliary information by operating the information 
switch 209 provided on the body of the handy phone as shown in 
fig. 42, or by sensing the user's hand holding the body 
with the pressure sensor 209a or the sweat sensor 209b shown in 
fig. 43 or 44, or by direct hand-writing onto the touch 
panel of the liquid crystal display 203 as shown in fig. 45, 
or by detecting the level of the cheers or the user's voice which 
is picked up by the conversation microphone 207. In this way, 
the user can easily input auxiliary information by inputting some 
parameters using any of the above-mentioned methods which are 
familiar to ordinary users, without presupposing knowledge of 
MPEG-7, which cannot be expected of ordinary users, and 
furthermore, the user can easily obtain index information 
(auxiliary information) when the captured moving picture is 
played back later. 

As described above, according to the second embodiment of 
the present invention, in the handy phone, auxiliary information 
is input by inputting some parameters using a method that 
is familiar to the ordinary user, without presupposing knowledge 
of MPEG-7, which cannot be expected of the ordinary user. The 
method is, for example, a combination of: 
(1) operations of the recording switch, the power button, and the 
like which are assigned to the ten key 201 or the function key 
202 operated in synchronization with the capturing 
operation of the user; (2) user operation of putting a finger or 
a pen onto a portion corresponding to a desired option button of 
a menu displayed on the touch panel of the liquid crystal display 
203; (3) user operation on the information switch which is 
provided on the body of the handy phone; or (4) user operation of 
touching or hand-writing on the liquid crystal monitor. 
Therefore, the user can easily input auxiliary information, and 
obtain index information (auxiliary information) when the 
captured moving picture is played back later. 

Also in this second embodiment, as in the first embodiment, 
whether auxiliary information should be selected or not may be 
selected by the user every time the user turns on the power 
button, or it may be set independently of turn-on of the power 
button. 

Further, auxiliary information may be generated in 
synchronization with the capturing button. When the 
handy phone is set such that the power is turned off manually or 
automatically for long battery life, in order to cope with this 
setting, auxiliary information may be generated in 
synchronization with power on/off. 

Further, as already described for the first embodiment, the 
descriptions of XML or the like are not limited to the above- 
described several kinds of parameters, and it is possible to 
select desired parameters from a menu according to the purpose. 
Further, although the second embodiment is described on the 
premise that a captured picture is transmitted, a 
captured picture is not necessarily transmitted, and the 
invention can be used also when compressed video and audio data 
are recorded as they are. Furthermore, although auxiliary 
information is generated at capturing, it is also possible to 
generate auxiliary information even at playback by using the 
information button at the time of playback, i.e., when playing a 
picture captured by the handy phone itself of this second 
embodiment to confirm the picture. When recording or transmitting 
the generated auxiliary information, it is decided, according to 
the construction of the device or system, whether the auxiliary 
information should be recorded/transmitted after being 
multiplexed in the captured video and audio data, or 
should be stored in another place for recording/transmission so 
that the auxiliary information can be recorded/transmitted 
independently of the captured video and audio data. 

Furthermore, it is also possible to detect a section in the 
video data corresponding to a degree of importance specified by 
the user, and reproduce only this section by the handy phone 
itself to be displayed on the liquid crystal display. Therefore, 
the CPU can extract only data having a high degree of importance 
from the video data recorded on the recording medium, and 
reproduce the extracted data, whereby confirmation of the 
recorded data by the monitor can be carried out with efficiency. 
Further, the user can enjoy the recorded "work" without feeling 
tired, and the power consumption is reduced to secure more 
driving time. 

Furthermore, it is possible to record a value of a viewpoint 
which has previously been determined, by using the information 
button. When the ten key is used for inputting the value of the 
viewpoint, it should be instructed in advance with the function 
key or the like. 
[Embodiment 3] 

Fig. 47 is a flowchart for explaining an example of an 
image data generation method according to a third embodiment of 
the present invention. It 
is assumed that the flowchart shown in fig. 47 is executed 
by a control CPU which is embedded in a handy phone or the like. 

Fig. 47 shows an example of a method for extracting 
(moving) video and audio data to be transmitted so that as much 
video and audio data as possible can be transmitted at a telephone 
charge lower than specified, when video and audio data, to which 
auxiliary information generated by the auxiliary information 
generator according to the second embodiment is attached, is 
transmitted by a handy phone or the like. 

In fig. 47, a destination and contents to be 
transmitted are selected in steps 41 and 42. Thereafter, a 
telephone charge is set in step 43, and a length L of contents 
which can be transmitted at the set telephone charge is 
calculated in step 44. Since video and audio contents are 
usually data-compressed, the length L corresponds to the length 
of the compressed data. However, the real time of video and 
audio can be easily obtained by converting the length L to the 
data size before compression. The video and audio data is divided 
into sub-sections called "segments" according to the auxiliary 
information. Then, an initial value Pr of priority is set in 
step 45, and a priority of a target segment is calculated in step 
46. Thereafter, in step 47, the calculated priority is compared 
with the initial value Pr by utilizing priority information which 
is included in the auxiliary information for each segment. For 
example, in fig. 3, in a description of PointOfView 
(viewpoint description), a value where ViewPoint="exciting" is 
extracted and compared with the Pr. Although in this example the 
priority is calculated from the value of one description, when 
there is a plurality of descriptions of priority values, 
the corresponding priorities are derived by using a predetermined 
conversion expression, a representative priority is 
determined and, thereafter, the representative priority is 
compared with the Pr. When the derived priority of the target 
segment is larger than the set value Pr, this segment is selected 
in step 48. When it is judged that the above-mentioned steps 
have been completed (step 49) and that the total length of the 
selected segments is shorter than the set data length L (step 
491), it is confirmed that at least one segment is selected (step 
493) to end the process. 

On the other hand, when it is judged in step 491 that the 
total of the lengths of the selected segments is longer than the 
data length L, the priority set value Pr is incremented in step 
492, and the same operation as mentioned above is repeated. For 
example, in the case where the total of the lengths of the 
segments, which are selected when the priority set value Pr is 
"0.5", is longer than the data length L which can be transmitted 
at the predetermined telephone charge, an increment "0.1" is 
added to the priority set value Pr to make it "0.6", whereby the 
number of segments to be selected is reduced. This operation is 
repeated until the total of the segment lengths falls within the 
data length L which can be transmitted at the predetermined 
telephone charge. In this way, the priority set value Pr is 
increased in predetermined increments such as "0.1", and a 
priority set value Pr, at which the total of the segment lengths 
becomes lower than the data length L, is detected. Thereby, the 
total of the segment lengths falls within the predetermined data 
length L, and only the data having a high degree of importance 
can be collected. 
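
The loop of steps 45 to 492 can be sketched as follows. This is a minimal illustration under stated assumptions, not the flowchart of fig. 47 itself: the name `select_segments` and the representation of each segment as a `(priority, length)` pair are hypothetical, a total exactly equal to L is assumed to be acceptable, and the set value is recomputed from a step count (rather than added to repeatedly) only to avoid floating-point drift.

```python
def select_segments(segments, max_length, initial_priority=0.5, step=0.1):
    """Raise the priority set value Pr until the selection fits in L.

    segments: list of (priority, length) pairs (assumed layout).
    Returns (pr, selected); selected may be empty, in which case the
    caller should treat the result as "no segment selected" (the
    check corresponding to step 493).
    """
    if step <= 0:
        raise ValueError("step must be positive")
    k = 0
    while True:
        pr = round(initial_priority + k * step, 6)
        # Steps 46 to 48: select segments whose priority exceeds Pr.
        selected = [seg for seg in segments if seg[0] > pr]
        # Step 491: compare the total length with the data length L.
        if sum(length for _, length in selected) <= max_length:
            return pr, selected
        k += 1  # step 492: increment Pr and repeat


segments = [(0.9, 40), (0.7, 30), (0.55, 50), (0.3, 20)]
pr, chosen = select_segments(segments, max_length=80)
print(pr, chosen)
```

Because only the priorities and lengths held in the auxiliary information are examined, the loop never touches the video and audio data themselves.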

Since the above-mentioned processes are carried out using 
the auxiliary information, the video and audio data are not 
directly handled. Therefore, the processing load falls within a 
sufficiently allowable range. 

As described above, in the third embodiment of the invention, 
according to an upper limit of a telephone charge that is set by 
the user, an allowable calling time is determined within this 
charge. Then, a priority level is set, and the priority level is 
varied so that the total of segments whose priorities are higher 
than the set priority approaches, as closely as possible, a time 
whose upper limit is the calling time. Therefore, as many 
important segments, i.e., important video and audio data, as 
possible can be selected within the range of the predetermined 
telephone charge, and these segments can be input. 
[Embodiment 4] 

Fig. 48 is a flowchart for explaining an example of a 
video data generation method according to a fourth embodiment of 
the present invention. It 
is assumed that the flowchart shown in fig. 48 is executed 
by a control CPU which is embedded in a handy phone or the like. 

Fig. 48 shows a video data generation method based on 
the premise that generated video data is attached to a mail. 
Initially, a mail address and a title are set in steps 51 and 52, 
respectively. Thereafter, in step 53, information relating to 
the preference of a person to whom the mail is directed (for 
example, a description of UserPreference in MPEG-7), which 
information is stored in the handy phone, is extracted from the 
database according to the mail address, and a priority Py is set 
in step 54. Simultaneously, a keyword is extracted from the 
title in step 55. Next, in steps 56, 57, 58, and 59, a segment 
whose priority is higher than the Py or a segment including a 
keyword in the description of Viewpoint or the title of 
VideoSegment is selected from the selected contents. When, in 
step 591, it is judged that all of the segments have been 
subjected to the checks in steps 56 and 58, only a part relating 
to the title or the preference of the receiver of the mail is 
attached to the mail to be transmitted. 
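
The selection in steps 53 to 591 can be illustrated as follows. This is a sketch under assumptions that the description does not fix: the name `select_for_recipient`, the dictionary keys `title`, `viewpoint`, and `priority`, the direct use of a stored per-address value as the priority Py, the default value used for an unknown address, and the keyword extraction by splitting the mail title on whitespace are all hypothetical.

```python
def select_for_recipient(segments, preferences, address, title,
                         default_priority=0.5):
    """Select segments by recipient preference or by title keyword.

    segments: list of dicts with 'title', 'viewpoint', 'priority'.
    preferences: mapping from mail address to a priority Py.
    """
    # Steps 53 and 54: look up the receiver and set the priority Py.
    py = preferences.get(address, default_priority)
    # Step 55: extract keywords from the mail title (naive split).
    keywords = {word.lower() for word in title.split()}

    selected = []
    for seg in segments:
        words = set((seg["viewpoint"] + " " + seg["title"]).lower().split())
        # Steps 56 to 59: keep the segment if its priority is higher
        # than Py or if its description contains a keyword.
        if seg["priority"] > py or keywords & words:
            selected.append(seg)
    return selected
```

A practical keyword extractor would at least discard common words; the whitespace split is used here only to keep the sketch short.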

As described above, according to the fourth embodiment, the 
taste or preference of the receiver is decided according to the 
mail address of the receiver, and a degree of importance is 
decided from the taste or preference, and segments whose degrees 
of importance are higher than the decided degree of importance 
are collected to be transmitted to the receiver. Therefore, only 
an important part of the contents can be transmitted, whereby the 
telephone charge can be reduced at both the transmitter and the 
receiver. 

While in fig. 48 segments to be transmitted are 
selected according to the address and the title, segments to be 
transmitted may be selected from the keyword in the contents of 
the mail document, or the frequency of occurrence of the keyword. 
Further, although in fig. 48 the data length is not limited, 
when fig. 48 is combined with fig. 47, further 
reduction in telephone charge can be achieved. 

While in the third and fourth embodiments segments in 
contents are selected, the present invention is not restricted 
thereto. The present invention is also applicable to the case 
where desired contents are selected from a plurality 
of contents or from all of already-recorded contents. 

Furthermore, although the priority of each segment is 
calculated using the degree of importance or preference, the 
present invention is not restricted thereto. For example, 
information about the capability of the terminal at destination 
or other information such as length, title, and the like can also 
be used by converting it into the priority. 

Furthermore, although the user stores the preferences of 
mail receivers in the database in the handy phone, when such 
database exists on the network, the user need not have the 
database but can access the external database as necessary. 

Furthermore, it is possible to constitute a database by 
attaching data of your preference or data of the capability of a 
terminal at your end to a mail when transmitting the mail. 

Furthermore, while the third and fourth embodiments are 
described for the case where video and audio data are transmitted, 
the present invention is also applicable to the case where video 
and audio data having a predetermined length are recorded on a 
recording medium. 

Furthermore, the auxiliary information generation apparatus 
according to any of the first to fourth embodiments can be 
implemented by a computer as shown in fig. 49. Fig. 49 
is a diagram illustrating a recording medium 61 on which 
a computer program and data are recorded, and a computer system 62. 
It is assumed that the recording medium 61 is a semiconductor 
memory card. The procedure shown in fig. 2, 47, or 48 is 
implemented by a program, and the program is recorded on the 
recording medium 61, whereby the program can be ported to the 
computer system 62 to be executed. Further, the same effects as 
those achieved by the aforementioned embodiments can be obtained 
by writing and reading the data itself in/from the recording 
medium. 

While in the aforementioned embodiments a video tape and a 
semiconductor memory are used as data recording media, a floppy 
disk or an optical disk such as CD-R, CD-RW, MO, MD, or DVD may 
be employed as long as it has a sufficient capacity. 

Furthermore, while in the first and second embodiments a 
combined camera and digital VTR is taken as an example, a 
portable VTR or stationary VTR having a separated camera may be 
employed. 

Moreover, while in the third and fourth embodiments a handy 
phone with a camera function is taken as an example, a PDA 
(Personal Digital Assistant) or a portable game machine may be 
employed as long as it is provided with a camera function or a 
camera can be connected to it. 



ABSTRACT 



Auxiliary information relating to digital data such as video 
and audio data is generated in synchronization with power-on, or 
it is generated according to an instruction from the user 
operating an information button or the like. Further, only an 
important portion of digital data is extracted by using the 
auxiliary information to generate reduced digital data, whereby 
the time and cost required for transmission are reduced. 



